Accommodating Dimension Hierarchies in a Data Warehouse View/index Selection Scheme
نویسنده
چکیده
Storing vast number of aggregate tables (materialized views) of the base data collected from its various independent data sources is one way warehousing systems provide fast access to data requested by complex warehouse queries. A data warehouse collects, stores and integrates large amounts of data from various function oriented databases over a long period of time which is used for online analytical processing (OLAP). In addition to storing views which project mostly on primary key attributes (e.g., customerid), materializing some of their indexes help reduce query response time at the expense of increasing maintenance cost for stored tables and diminishing storage space. Thus, in order to achieve near optimal query response time, maintenance cost and storage space utilization, schemes that enable careful selection of views and indexes are required. For an even better system performance, extending these selection schemes to accommodate views that are grouped on dimension attributes of subjects (e.g., customer name, phone) and not on their primary keys cuts oo the enormous amount of time spent performing joins with dimension tables. A data cube is an elegant way for representing information in a Warehouse and is an n-dimensional view with 2 n subviews. Most commercial OLAP systems today pick the summary tables and indexes to materialize in a trial-and-error manner. This paper extends a uniform technique for selecting subviews and indexes of a data cube to materialize, to also include selection of some dimension views. The objective is improve system performance by selecting the set of subviews (V) of the data cube, some of their dimension views (D), and some of their indexes (I) that have the best resultant query response time, maintenance cost taking into account their space usage and the warehouse query access pattern and usage.
منابع مشابه
بهبود الگوریتم انتخاب دید در پایگاه داده تحلیلی با استفاده از یافتن پرس وجوهای پرتکرار
A data warehouse is a source for storing historical data to support decision making. Usually analytic queries take much time. To solve response time problem it should be materialized some views to answer all queries in minimum response time. There are many solutions for view selection problems. The most appropriate solution for view selection is materializing frequent queries. Previously posed ...
متن کاملA Solution to View Management to Build a Data Warehouse
Several techniques exist to select and materialize a proper set of data in a suitable structure that manage the queries submitted to the online analytical processing systems. These techniques are called view management techniques, which consist of three research areas: 1) view selection to materialize, 2) query processing and rewriting using the materialized views, and 3) maintaining materializ...
متن کاملCooperative Heuristics for the Federated View Selection Problem
The federated view selection problem (FVSP) is an optimization technique designed to enhance query performance in a federated data warehouse environment through the materialization of select views given resource constraints and storage restrictions. Current research focuses on single-instance heuristics, which have difficulty scaling. In this work, we introduce two commonly used cooperative heu...
متن کاملSelection of Views to Materialize in
A data warehouse stores materialized views of data from one or more sources, with the purpose of eeciently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost...
متن کاملData Warehouse Snowflake Design and Performance Considerations in Business Analytics
Snowflake is a data warehouse schema design where dimension tables are normalized on top of a star schema design. Snowflake schema is generally not recommended due to its performance overhead in joining the normalized dimension tables. However, the Snowflake schema can be extended in a way to improve performance for business analysis activities. In business analytics paradigm, two distinct envi...
متن کامل